Column Heterogeneity as a Measure of Data Quality
Authors
Abstract
Data quality is a serious concern in every data management application, and a variety of quality measures, including accuracy, freshness and completeness, have been proposed to capture the common sources of data quality degradation. We identify and focus attention on a novel measure, column heterogeneity, that seeks to quantify the data quality problems that can arise when merging data from different sources. We identify desiderata that a column heterogeneity measure should intuitively satisfy, and discuss a promising direction of research for quantifying database column heterogeneity based on a novel combination of cluster entropy and soft clustering. Finally, we present a few preliminary experimental results, using diverse data sets of semantically different types, to demonstrate that this approach appears to provide a robust mechanism for identifying and quantifying database column heterogeneity.
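The abstract names cluster entropy over a soft clustering but does not give the formulation, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes each column value already has a membership distribution over k clusters (how those memberships are obtained is left open), averages them into a per-cluster mass, and takes the Shannon entropy of that mass. The function name `soft_cluster_entropy` and the toy inputs are hypothetical.

```python
import math

def soft_cluster_entropy(memberships):
    """Entropy (in bits) of the average soft cluster membership.

    memberships: list of rows, one per column value; each row is an
    assumed probability distribution over k clusters (sums to 1).
    """
    n = len(memberships)
    k = len(memberships[0])
    # Marginal cluster mass: mean membership probability per cluster.
    mass = [sum(row[c] for row in memberships) / n for c in range(k)]
    # Shannon entropy; clusters with zero mass contribute nothing.
    return -sum(p * math.log2(p) for p in mass if p > 0)

# Toy inputs (hypothetical): every value in one cluster vs. an even split.
homogeneous = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
mixed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

h_homogeneous = soft_cluster_entropy(homogeneous)
h_mixed = soft_cluster_entropy(mixed)
```

Under these assumptions, a column whose values all fall in one cluster scores zero, while a column split evenly across two clusters scores one bit, which matches the intuition that a higher score indicates a more heterogeneous column.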
Similar resources
A Study of Gas Flow in a Slurry Bubble Column Reactor for the DME Direct Synthesis: Mathematical Modeling from Homogeneity vs. Heterogeneity Point of View
In the present study, a heterogeneous and homogeneous gas flow dispersion model was developed for the simulation and optimization of a large-scale catalytic slurry reactor for the direct synthesis of dimethyl ether (DME) from synthesis gas (syngas) and CO2, operating in the churn-turbulent regime. In the heterogeneous flow model, the gas phase was distributed into two bubble phases including small and large...
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...
Prevalence of Poor Sleep Quality in Students of Iranian Universities: A Systematic Review and Meta-Analysis
Background: Owing to the stress of studying and the academic workload, students' sleep patterns differ from those of their non-student peers. The aim of this study was to determine the prevalence of poor sleep quality in college students in Iran through a meta-analysis, to serve as a summary measure for policy makers in this field. Methods: In this meta-analysis study, the databases of PubMed, Science Direct...
Developing a Model of Heterogeneity in Driver’s Behavior
The Intelligent Driver Model (IDM) is a well-known microscopic model of traffic flow in the traffic engineering community. While it is a powerful technique for modeling traffic flows, the Intelligent Driver Model cannot accommodate the heterogeneous behavior of drivers on the road. With this limitation in mind, this paper takes the lane to recognize th...
Dimensionality analysis of subsurface structures in magnetotellurics using different methods (a case study: oil field in Southwest of Iran)
The magnetotelluric (MT) method is an electromagnetic technique that uses the Earth's natural field to map changes in electrical resistivity in subsurface structures. Because of the high penetration depth of the electromagnetic fields in this method (tens of meters to tens of kilometers), MT data are used to investigate shallow to deep subsurface geoelectrical structures and their dimensions....
Journal title:
Volume, issue:
Pages: -
Publication date: 2006